Multi-Agent Distributed Deep Deterministic Policy Gradient for Partially Observable Tracking

Authors

Abstract

In many existing multi-agent reinforcement learning tasks, each agent observes all the other agents from its own perspective. In addition, the training process is centralized, namely the critic of each agent can access the policies of all the other agents. This scheme has certain limitations, since in practical applications every single agent can only obtain the information of its neighbor agents due to the limited communication range. Therefore, in this paper, a multi-agent distributed deep deterministic policy gradient (MAD3PG) approach is presented with decentralized actors and distributed critics to realize multi-agent distributed tracking. The distinguishing feature of the proposed framework is that we adopted distributed training with decentralized execution, where each critic takes only its own agent's and the neighbor agents' information into account. Experiments were conducted on distributed tracking tasks based on multi-agent particle environments, where N (N=3, N=5) agents track a target under partial observation. The results showed that the proposed method achieves a higher reward in a shorter time compared to other methods, including MADDPG, DDPG, PPO, and DQN. The proposed novel method leads to more efficient and effective multi-agent tracking.
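
To make the abstract's training/execution split concrete, here is a minimal PyTorch-style sketch of the idea: a decentralized actor that sees only its own observation, next to a critic that conditions only on the agent's own and its neighbors' observations and actions. This is not the authors' implementation; the module names, network sizes, and the fixed-size neighborhood are assumptions for illustration.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Decentralized actor: acts on the agent's own observation only."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

class NeighborCritic(nn.Module):
    """Q(o_i, a_i, {o_j, a_j : j in N(i)}): a critic limited to the agent's
    own and its neighbors' information, in contrast to a fully centralized
    MADDPG critic that conditions on every agent."""
    def __init__(self, obs_dim, act_dim, n_neighbors, hidden=64):
        super().__init__()
        in_dim = (obs_dim + act_dim) * (1 + n_neighbors)
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, own_obs, own_act, nbr_obs, nbr_act):
        # nbr_obs / nbr_act: (batch, n_neighbors, dim); flatten the
        # neighborhood before concatenating with the agent's own data.
        x = torch.cat(
            [own_obs, own_act, nbr_obs.flatten(1), nbr_act.flatten(1)],
            dim=-1)
        return self.net(x)
```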

Similar Resources

Parameter Sharing Deep Deterministic Policy Gradient for Cooperative Multi-agent Reinforcement Learning

Deep reinforcement learning for multi-agent cooperation and competition has been a hot topic recently. This paper focuses on the cooperative multi-agent problem, based on actor-critic methods under local-observation settings. Multi-agent deep deterministic policy gradient (MADDPG) obtained state-of-the-art results for some multi-agent games; however, it cannot scale well with a growing number of agents. In order ...
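
The parameter-sharing idea in this entry's title can be sketched briefly: all agents act through one shared network, so the parameter count stays constant as the number of agents grows. A minimal sketch, assuming a one-hot agent index to disambiguate roles (an addition not stated in the abstract):

```python
import torch
import torch.nn as nn

class SharedActor(nn.Module):
    """One actor network reused by every agent."""
    def __init__(self, obs_dim, act_dim, n_agents, hidden=64):
        super().__init__()
        self.n_agents = n_agents
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_agents, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs, agent_idx):
        # Append a one-hot agent id so the shared weights can still
        # express per-agent behavior.
        one_hot = nn.functional.one_hot(agent_idx, self.n_agents).float()
        return self.net(torch.cat([obs, one_hot], dim=-1))

# All agents call the same module, so gradients from every agent's
# experience update a single set of weights.
actor = SharedActor(obs_dim=8, act_dim=2, n_agents=5)
obs = torch.randn(5, 8)       # one observation per agent
actions = actor(obs, torch.arange(5))  # shape (5, 2)
```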

Policy-Gradient Algorithms for Partially Observable Markov Decision Processes

Partially observable Markov decision processes are interesting because of their ability to model most conceivable real-world learning problems, for example, robot navigation, driving a car, speech recognition, stock trading, and playing games. The downside of this generality is that exact algorithms are computationally intractable. Such computational complexity motivates approximate approaches....
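
As a concrete instance of the approximate, policy-gradient route this entry points to, a minimal REINFORCE-style update conditions the policy on observations rather than the hidden state, which is what makes it applicable to POMDPs. The environment dimensions below are placeholders, not from the paper:

```python
import torch
import torch.nn as nn

# Policy maps a 4-dim observation (placeholder size) to logits over 2 actions.
policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_update(observations, actions, returns):
    """One policy-gradient step: ascend E[log pi(a|o) * G].
    The gradient needs only observations, never the hidden state."""
    log_probs = torch.distributions.Categorical(
        logits=policy(observations)).log_prob(actions)
    loss = -(log_probs * returns).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Dummy rollout data: 10 steps of observations, actions, and returns.
reinforce_update(torch.randn(10, 4),
                 torch.randint(0, 2, (10,)),
                 torch.randn(10))
```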

Planning for Partially Observable, Non-Deterministic Domains

From 12.06.2005 to 17.06.2005, the Dagstuhl Seminar 05241 "Synthesis and Planning" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas ...

Deep Deterministic Policy Gradient for Urban Traffic Light Control

Traffic light timing optimization is still an active line of research despite the wealth of scientific literature on the topic, and the problem remains unsolved for any non-toy scenario. One of the key issues with traffic light optimization is the large scale of the input information that is available for the controlling agent, namely all the traffic data that is continually sampled by the traf...

Learning Partially Observable Deterministic Action Models

We present the first tractable, exact solution for the problem of identifying actions’ effects in partially observable STRIPS domains. Our algorithms resemble Version Spaces and Logical Filtering, and they identify all the models that are consistent with observations. They apply in other deterministic domains (e.g., with conditional effects), but are inexact (may return false positives) or inef...
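
The consistency-filtering idea behind this entry (in the spirit of Version Spaces and Logical Filtering) can be illustrated with a small sketch: retain exactly those candidate action models whose predictions agree with a possibly partial observation of the next state. The Model representation and fluent names here are illustrative assumptions, not the paper's encoding:

```python
class Model:
    """Candidate action model: the fluent values an action sets."""
    def __init__(self, effects):
        self.effects = effects  # dict: fluent -> value after the action

    def predict(self, state):
        next_state = dict(state)
        next_state.update(self.effects)
        return next_state

def filter_consistent(candidates, state, observation):
    """Keep models whose predicted next state matches every observed
    fluent; unobserved fluents constrain nothing (partial observability)."""
    keep = []
    for m in candidates:
        pred = m.predict(state)
        if all(pred.get(f) == v for f, v in observation.items()):
            keep.append(m)
    return keep

# Two hypotheses about one action's effect; a partial observation of the
# next state ("door_open" is true) eliminates the second hypothesis.
candidates = [Model({"door_open": True}), Model({"door_open": False})]
survivors = filter_consistent(
    candidates, state={"door_open": False}, observation={"door_open": True})
assert len(survivors) == 1
```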

Journal

Journal title: Actuators

Year: 2021

ISSN: 2076-0825

DOI: https://doi.org/10.3390/act10100268